Benchmarking Pascal Compilers: TopSpeed v3.02 v Turbo v6.0
==========================================================
In response to messages on the BBS suggesting that TopSpeed Pascal may
generate slower code than Turbo Pascal, I have prepared a number of standard
benchmarks to test these claims. Benchmarks need careful interpretation
and the performance of a compiler in a particular benchmark does not
necessarily imply anything about the performance in a real world application.
Most benchmarks are very small programs that test a single aspect of a
compiler's performance, typically in a very limited fashion. Modern
optimising compilers may perform very differently in large real world
applications than in trivial benchmarks.
The following text presents the results of my benchmarks together with a
brief description of the benchmark program and an interpretation of the
results. The benchmarks themselves are included with this paper.
The timing mechanism for these benchmarks is a little unusual and requires
some explanation. Each benchmark is run in a loop, which naturally incurs
some overhead that must be accounted for. To measure the overhead, a loop
which calls a Dummy procedure is executed for 1 second. The number of
completed loops is recorded in NullLoops and the time taken is recorded in
NullTime. Then the actual benchmark is run in a loop which invokes the
benchmark AND the Dummy procedure. This loop is executed for approximately
1 minute. The number of completed loops is recorded in BenchLoops and the
time taken is recorded in BenchTime. The loop overhead for the latter loop
is then calculated using the expression:
    LoopOverhead := (NullTime/NullLoops)*BenchLoops
The result is subtracted from BenchTime to give TotalTime, and
LoopsPerSecond is calculated as BenchLoops/TotalTime. Thus the greater the
value of LoopsPerSecond, the faster the compiled benchmark.
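
As a worked example of the calculation above, the following minimal sketch
(in Python rather than Pascal, purely for illustration) applies the same
three steps to the TopSpeed NOx87 column of the Dhrystone table. The function
name loops_per_second is an assumption of this sketch and does not appear in
the benchmark sources.

```python
def loops_per_second(null_time, null_loops, bench_time, bench_loops):
    """Apply the loop-overhead correction described in the text.

    Returns (loop_overhead, total_time, loops_per_second).
    """
    # Cost of one pass through the empty loop (Dummy call only), in seconds.
    per_loop_overhead = null_time / null_loops
    # LoopOverhead := (NullTime/NullLoops)*BenchLoops
    loop_overhead = per_loop_overhead * bench_loops
    # TotalTime is the time attributed to the benchmark itself.
    total_time = bench_time - loop_overhead
    return loop_overhead, total_time, bench_loops / total_time


# Figures from the TopSpeed NOx87 column of the Dhrystone table.
overhead, total, lps = loops_per_second(1.04, 3469, 60.03, 26653)
print(f"LoopOverhead={overhead:.2f} TotalTime={total:.2f} "
      f"LoopsPerSecond={lps:.2f}")
```

Rounded to two decimal places, the three values agree with the LoopOverhead,
TotalTime and Loops per second rows printed for that column.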
NOTE: The NOx87 benchmarks were run on a 25MHz 386 with no coprocessor. The
Turbo Pascal REAL type uses a non-standard 6-byte representation, which is
the only floating point type supported by the Turbo Emulator. The TopSpeed
Pascal REAL type uses an 8-byte representation, though the TopSpeed Emulator
also supports 4-byte and 10-byte floating point types, with 10-byte
intermediate values. All the TopSpeed types are designed for compatibility
with the formats used by the Intel coprocessors. Because of the difference
in representation it is reasonable to expect TopSpeed Pascal floating point
programs to achieve better precision at the cost of some speed. Also the
TopSpeed INTEGER type uses a 4-byte representation whereas the Turbo Pascal
INTEGER type uses a 2-byte representation. The EMU and x87 benchmarks were
run on a 20MHz 486DX. You should be careful when comparing results from
different machines. For comparative purposes the benchmarks use the types
MyReal and MyInt, which are defined as follows:
                        TopSpeed            Turbo
                        ========            =====
MyReal (EMU & NOx87)    REAL (8-byte)       REAL (6-byte)
MyReal (x87)            REAL (8-byte)       DOUBLE (8-byte)
MyInt                   INTEGER (4-byte)    LONGINT (4-byte)
x87 results are not shown for non-floating point benchmarks; instead the
x87 column contains "N/A".
The Turbo Pascal programs were compiled from the command line using the
following parameters:
(EMU & NOx87)   /B /$A+ /$D- /$E+ /$N- /$G+ /$I- /$L- /$R- /$S- /$V-
(x87)           /B /$A+ /$D- /$E- /$N+ /$G+ /$I- /$L- /$R- /$S- /$V-
The TopSpeed Pascal benchmarks were compiled using the following .PR files:
(EMU & NOx87)   #system auto exe
                #model small
                #pragma optimize(cpu=>286)
                #compile %main
                #link %prjname
(x87)           #system auto exe
                #model small
                #pragma optimize(cpu=>286, copro=>287)
                #compile %main
                #link %prjname
Both compilers were instructed to produce code optimised for the 286 and
for speed, with no run-time checks.
Ackermann Benchmark
===================
The Ackermann Benchmark is a simple recursive benchmark for function call
overhead. TopSpeed Pascal passes parameters in registers, unlike most other
compilers, and can be expected to outperform other compilers in this
benchmark.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.05     N/A      1.04    1.04     N/A
BenchTime        :   60.15   60.96             61.30   60.58
Null loops       :    3276    2506              3648    2674
Bench loops      :      63      57                45      35
LoopOverhead     :    0.02    0.02              0.01    0.01
TotalTime        :   60.13   60.94             61.29   60.57
Loops per second :    1.05    0.94              0.73    0.58
TopSpeed Pascal comes out as the clear winner in this benchmark, illustrating
that register parameter passing can have a marked effect. The effect is
greatest where there are a large number of calls to small functions or
procedures taking a number of parameters.
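
To show why this benchmark is dominated by call overhead, here is a minimal
sketch of the standard two-argument Ackermann function, written in Python for
illustration only. The arguments used by the included Pascal benchmark are
not stated in this paper, so the values below are an assumption chosen to
keep the example small. The counter parameter and the count_calls helper are
additions of this sketch.

```python
def ackermann(m, n, counter):
    """Standard two-argument Ackermann function.

    counter is a one-element list used to count calls (illustrative only).
    """
    counter[0] += 1
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1, counter)
    return ackermann(m - 1, ackermann(m, n - 1, counter), counter)


def count_calls(m, n):
    """Return (value, number_of_calls) for ackermann(m, n)."""
    counter = [0]
    value = ackermann(m, n, counter)
    return value, counter[0]


value, calls = count_calls(2, 3)
print(f"ackermann(2, 3) = {value} after {calls} calls")
```

Each call does almost no arithmetic, so the time taken is mostly the cost of
passing two parameters and returning a result.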
Dhrystone Benchmark (March 84), Version Pascal / 2
==================================================
This is a translation of the classic synthetic benchmark by Reinhold Weicker.
The Dhrystone was written to contain a sequence of statements of different
types that would closely match the proportions of these statements found in
a large sample of real programs. It is generally thought that this benchmark
should provide a good indication of a compiler's performance in real world
applications. However many modern optimising compilers have been written to
do well in benchmarks such as the Dhrystone and so give misleading results.
On the other hand careful analysis of the resulting code can reveal a
compiler's weaknesses and strengths. There are no floating point expressions
in the Dhrystone, making it a good test of the code generation capabilities
of the compiler. The Dhrystone is particularly sensitive to string or array
handling optimisations.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.05     N/A      1.04    1.05     N/A
BenchTime        :   60.03   60.03             60.03   60.03
Null loops       :    3469    2569              3696    2746
Bench loops      :   26653   18518             14654   13304
LoopOverhead     :    7.99    7.57              4.12    5.09
TotalTime        :   52.04   52.46             55.91   54.94
Loops per second :  512.17  352.98            262.12  242.14
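TopSpeed Pascal performs almost twice as many Dhrystone loops per second as
Turbo on the 386 (512.17 v 262.12) and roughly one and a half times as many
on the 486DX (352.98 v 242.14). This suggests that the optimisation
performed by TopSpeed Pascal is giving it an edge.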
FBench Benchmark
================
This benchmark uses a complete optical ray-tracing algorithm and provides a
good indication of a compiler's floating point performance and accuracy. The
benchmark can be very sensitive to the efficiency of the trigonometric
functions in the run-time library. The benchmark is also very sensitive to
errors in the calculations, though these results aren't displayed here
since both compilers fell within the tolerances of the benchmark.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.04    1.04      1.05    1.04    1.04
BenchTime        :   60.09   60.03   60.04     60.03   60.03   60.03
Null loops       :    3346    2519    2518      3665    2673    2757
Bench loops      :     383   15375   15997      1159    1280    8538
LoopOverhead     :    0.12    6.35    6.61      0.33    0.50    3.22
TotalTime        :   59.97   53.68   53.43     59.70   59.53   56.81
Loops per second :    6.39  286.41  299.39     19.41   21.50  150.29
Turbo Pascal does exceptionally well in this benchmark on the machine with
no coprocessor, where both compilers must emulate floating point in software
(19.41 v 6.39 loops per second). This is probably due to the fact that Turbo
Pascal uses a 6-byte representation for REALs whereas TopSpeed Pascal uses
an 8-byte representation with a 10-byte internal representation within the
emulator. However TopSpeed Pascal does very much better when a coprocessor
is present. TopSpeed Pascal drives the chip in 'open mode' which tends to
result in exceptionally fast floating point code. NOTE: On the 486DX, Turbo
Pascal is still using its internal routines in EMU mode, rather than using
the chip. This may well be my fault, although it may just be failing to
detect the onboard coprocessor.
Fibonacci Benchmark
===================
The Fibonacci benchmark is similar to the Ackermann benchmark: it is a highly
recursive benchmark useful for testing function call overhead.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.04     N/A      1.04    1.04     N/A
BenchTime        :   60.20   61.85             60.64   60.25
Null loops       :    3377    2480              3778    2637
Bench loops      :      24      24                21      20
LoopOverhead     :    0.01    0.01              0.01    0.01
TotalTime        :   60.19   61.84             60.63   60.24
Loops per second :    0.40    0.39              0.35    0.33
TopSpeed Pascal comes out as the clear winner in this benchmark, illustrating
that register parameter passing can have a marked effect. The effect is
greatest where there are a large number of calls to small functions or
procedures taking a number of parameters.
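
The doubly recursive form of Fibonacci makes the number of calls grow almost
as fast as the result itself. The short Python sketch below (illustrative
only; the argument used by the included Pascal benchmark is not stated here,
and fib_with_calls is a name invented for this example) returns both the
value and the number of calls made.

```python
def fib_with_calls(n):
    """Naive doubly recursive Fibonacci.

    Returns (fib(n), number of calls made), counting this call itself.
    """
    if n < 2:
        return n, 1
    a, calls_a = fib_with_calls(n - 1)
    b, calls_b = fib_with_calls(n - 2)
    return a + b, calls_a + calls_b + 1


value, calls = fib_with_calls(20)
print(f"fib(20) = {value} after {calls} calls")
```

With one addition per call, almost all of the run time is call and return
overhead, which is why the parameter passing convention matters so much.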
Float Benchmark
===============
The Float Benchmark is a trivial floating point benchmark. This benchmark
has been popular for benchmarking C compilers and often gives some
indication of the efficiency of floating point expressions. However the
benchmark is prone to being optimised almost out of existence by a clever
optimiser. Real world applications don't contain code of this nature unless
they are poorly written, so a compiler that is clever enough to spot these
cases is not necessarily better on a larger scale. Many compiler
implementors would rather write a code generator that devotes its efforts to
performing useful optimisations on realistic code than to correcting a
program's design flaws.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.05    1.04    1.05      1.04    1.05    1.05
BenchTime        :   66.24   60.09   60.03     66.46   61.90   60.36
Null loops       :    3321    2522    2472      3724    2695    2778
Bench loops      :       9     342     352         8       8     133
LoopOverhead     :    0.00    0.14    0.15      0.00    0.00    0.05
TotalTime        :   66.24   59.95   59.88     66.46   61.90   60.31
Loops per second :    0.14    5.70    5.88      0.12    0.13    2.21
TopSpeed demonstrates a slight advantage in this benchmark under the
emulator, suggesting a more efficient handling of floating point expressions
despite the larger representation. When the coprocessor is in use TopSpeed's
advantage grows to more than a factor of two (5.88 v 2.21 loops per second).
Gamm Benchmark
==============
The GAMM benchmark is a floating point benchmark that provides an indication
of the efficiency of floating point expressions. Unlike the Float benchmark
it is non-trivial and not subject to over-optimisation.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.04    1.04      1.04    1.04    1.05
BenchTime        :   60.15   60.03   60.04     60.04   60.03   60.03
Null loops       :    3365    2547    2577      3800    2752    2737
Bench loops      :     192   10690    9031       959    1084    5928
LoopOverhead     :    0.06    4.36    3.64      0.26    0.41    2.27
TotalTime        :   60.09   55.67   56.40     59.78   59.62   57.76
Loops per second :    3.20  192.04  160.14     16.04   18.18  102.64
Turbo Pascal comes out the winner by a mile in this one under the emulator.
When the coprocessor is in use TopSpeed Pascal is the clear winner. This
benchmark again suggests that Turbo Pascal's internal floating point
representation gives it a clear advantage.
IntMath Benchmark
=================
The IntMath benchmark is a trivial benchmark that illustrates the efficiency
of integer expressions. It is unlikely to be over-optimised and so should
provide a pretty good idea of the compiler's capabilities in a very particular
area.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.04     N/A      1.04    1.05     N/A
BenchTime        :   60.09   60.04             61.74   61.02
Null loops       :    3384    2505              3736    2697
Bench loops      :     809     846                30      46
LoopOverhead     :    0.25    0.35              0.01    0.02
TotalTime        :   59.84   59.69             61.73   61.00
Loops per second :   13.52   14.17              0.49    0.75
Turbo Pascal executes this benchmark extremely slowly, suggesting that
TopSpeed Pascal's optimisation is having a great effect. Bear in mind
however that TopSpeed Pascal is using its natural INTEGER (4-byte) type
whereas Turbo Pascal is using LONGINTs (4-byte). For Turbo Pascal the 4-byte
representation is not as efficient as its 2-byte INTEGER, but it is necessary
to use it for a fair comparison with TopSpeed.
RealMath Benchmark
==================
The RealMath benchmark is a trivial benchmark that illustrates the efficiency
of floating point expressions. It is unlikely to be over-optimised and so
should provide a pretty good idea of the compiler's capabilities in a very
particular area.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.05    1.04    1.04      1.05    1.04    1.04
BenchTime        :   60.03   60.04   60.03     60.03   60.15   60.04
Null loops       :    3417    2510    2465      3731    2774    3779
Bench loops      :     580   31335   32934       404     441    7654
LoopOverhead     :    0.18   12.98   13.90      0.11    0.17    2.11
TotalTime        :   59.85   47.06   46.13     59.92   59.98   57.93
Loops per second :    9.69  665.90  713.86      6.74    7.35  132.12
TopSpeed Pascal outperforms Turbo in this benchmark. This result appears to
contradict the result obtained from the GAMM benchmark.
Savage Benchmark
================
The Savage Benchmark illustrates the efficiency of the compiler's
trigonometric functions.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.05    1.04      1.05    1.04    1.04
BenchTime        :   61.30   60.25   60.03     64.53   63.71   60.42
Null loops       :    3419    2467    2555      3759    2717    3817
Bench loops      :       4     183     188        10      11      76
LoopOverhead     :    0.00    0.08    0.08      0.00    0.00    0.02
TotalTime        :   61.30   60.17   59.95     64.53   63.71   60.40
Loops per second :    0.07    3.04    3.14      0.15    0.17    1.26
Turbo Pascal does well in this benchmark under the emulator, again suggesting
that the non-standard 6-byte REALs are giving it an edge. Again TopSpeed
wins under the coprocessor.
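
For readers unfamiliar with it, the Savage benchmark as commonly published
repeatedly passes a value through nested transcendental functions and their
inverses, so that the result should come back unchanged apart from rounding
error. The Python sketch below follows that common form, on the assumption
that the included Pascal version does the same; the iteration count of 2499
is the usual published figure, not one taken from this paper.

```python
import math


def savage(iterations=2499):
    """Commonly published form of the Savage benchmark (assumed here).

    Each pass should add exactly 1 in exact arithmetic, so any deviation
    from iterations + 1 is accumulated rounding error.
    """
    a = 1.0
    for _ in range(iterations):
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a


result = savage()
print(f"result = {result!r}, error = {abs(result - 2500.0):.3e}")
```

The loop therefore exercises both the speed and the accuracy of the run-time
library's sqrt, ln, exp, arctan and tan routines.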
Sieve Benchmark
===============
The Sieve is a classic benchmark that illustrates the efficiency of array
indexing and integer expressions.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.05     N/A      1.05    1.05     N/A
BenchTime        :   60.04   60.03             60.09   60.03
Null loops       :    3308    2514              3614    2775
Bench loops      :     960    1206               422     467
LoopOverhead     :    0.30    0.50              0.12    0.18
TotalTime        :   59.74   59.53             59.97   59.85
Loops per second :   16.07   20.26              7.04    7.80
TopSpeed Pascal does extremely well in this benchmark, which may suggest that
TopSpeed's 4-byte INTEGER expressions are more efficient than Turbo's 4-byte
LONGINTs.
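
The kind of code this benchmark stresses can be seen in a plain Sieve of
Eratosthenes. The Python sketch below is illustrative only: the array size
and exact variant used by the included Pascal benchmark are not given in this
paper, and count_primes is a name chosen for this example.

```python
def count_primes(limit):
    """Count the primes <= limit with a simple Sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= limit:
        if is_prime[i]:
            # Strike out every multiple of i, starting at i*i.
            for k in range(i * i, limit + 1, i):
                is_prime[k] = False
        i += 1
    return sum(is_prime)


print(count_primes(100))
```

The inner loop is nothing but an indexed store and an integer addition, so
the quality of array indexing and integer code dominates the result.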
Store Benchmark
===============
The Store Benchmark is a trivial test of the efficiency of a Pascal
implementation's file IO. This can be an extremely misleading benchmark: it
may be affected by the form of buffering used, if any, and by I/O checking
of various forms.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.04    1.04     N/A      1.04    1.04     N/A
BenchTime        :   60.75   60.20             60.20   60.31
Null loops       :    3407    2602              3712    3862
Bench loops      :      40      34                69      59
LoopOverhead     :    0.01    0.01              0.02    0.02
TotalTime        :   60.74   60.19             60.18   60.29
Loops per second :    0.66    0.56              1.15    0.98
Turbo Pascal comes out better in this benchmark; however this might be due
to buffering, to the checks the IO routines perform (or don't), and to
whether the IO routines are designed for typed files. I wasn't able to run
the Turbo benchmark on the 486; for some reason it never terminated.
TrigLog Benchmark
=================
The TrigLog benchmark tests the efficiency of a compiler's trigonometric and
logarithmic floating point functions.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.05    1.05    1.05      1.04    1.05    1.04
BenchTime        :   60.30   60.03   60.09     60.31   60.75   60.03
Null loops       :    3430    2479    2467      3716    2752    3703
Bench loops      :      14     658     670        39      44     241
LoopOverhead     :    0.00    0.28    0.29      0.01    0.02    0.07
TotalTime        :   60.30   59.75   59.80     60.30   60.73   59.96
Loops per second :    0.23   11.01   11.20      0.65    0.72    4.02
Turbo Pascal does well in this benchmark, again suggesting that the
non-standard 6-byte REALs are giving it an edge. Again TopSpeed Pascal wins
under the coprocessor.
Whetstone Benchmark
===================
The Whetstone Benchmark is a synthetic benchmark for testing floating point
performance. It contains a number of weighted floating point expressions
and operations on arrays and records containing REALs. The Whetstone can
give an indication of performance in real-world floating-point intensive
applications.
                           TopSpeed                   Turbo
                     NOx87     EMU     x87     NOx87     EMU     x87
                    ======================    ======================
NullTime         :    1.05    1.05    1.04      1.04    1.04    1.05
BenchTime        :   72.72   60.36   60.14     61.96   61.52   60.09
Null loops       :    3399    2391    2560      3739    2780    4020
Bench loops      :       5     148     167        13      14      82
LoopOverhead     :    0.00    0.06    0.07      0.00    0.01    0.02
TotalTime        :   72.72   60.30   60.07     61.96   61.51   60.07
Loops per second :    0.07    2.45    2.78      0.21    0.23    1.37
Turbo Pascal does well in this benchmark, again suggesting that the 6-byte
REALs are giving it an edge. TopSpeed runs twice as fast under the
coprocessor though.
EXECUTABLE SIZE
===============
One measure of a compiler's abilities that is often quoted is the size of
the resulting executable. While small executables are desirable, a small
executable does not always indicate a better compiler. With small programs
such as these benchmarks it is conceivable that a large percentage of the
program's size is made up of routines from the run-time library. These
routines may have been implemented differently for each compiler because
each vendor may have slightly different goals. For example, implementing
a Pascal run-time library for ISO conformance may result in a larger IO
library than a simple DOS IO interface. For this reason you shouldn't
assume that a compiler which generates a smaller executable from a small
source file will also generate a smaller executable from a huge source
file. The larger the amount of source code, the more the size depends
on the efficiency of the compiler and linker. Furthermore some libraries
are implemented largely in assembler, whilst others are implemented in a
high level language. A high level implementation often carries some size
penalty, particularly in small programs.
                TopSpeed    Turbo
                ========    =====
ackerman.exe       19743     6160
dhry.exe           21274     8608
fbench.exe         23700    11632
fibonacc.exe       19649     5824
float.exe          19647     6288
gamm.exe           20991     8112
imath.exe          19633     5984
rmath.exe          19634     5856
savage.exe         20436     7072
sieve.exe          19691     6096
store.exe          22353     6416
tmath.exe          20113     6992
tscrn.exe          19671     5792
whet.exe           22979    10688
whetchk.exe        21395    10224
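
One way to read the table is to look at the difference between the two
columns rather than at the raw sizes. The Python sketch below (illustrative
only; sizes are presumably in bytes, which the table does not state) computes
that difference for every program.

```python
# (TopSpeed size, Turbo size) for each executable in the table.
sizes = {
    "ackerman.exe": (19743, 6160),
    "dhry.exe": (21274, 8608),
    "fbench.exe": (23700, 11632),
    "fibonacc.exe": (19649, 5824),
    "float.exe": (19647, 6288),
    "gamm.exe": (20991, 8112),
    "imath.exe": (19633, 5984),
    "rmath.exe": (19634, 5856),
    "savage.exe": (20436, 7072),
    "sieve.exe": (19691, 6096),
    "store.exe": (22353, 6416),
    "tmath.exe": (20113, 6992),
    "tscrn.exe": (19671, 5792),
    "whet.exe": (22979, 10688),
    "whetchk.exe": (21395, 10224),
}

# Extra size of the TopSpeed executable over the Turbo one.
differences = {name: ts - tp for name, (ts, tp) in sizes.items()}
smallest = min(differences, key=differences.get)
largest = max(differences, key=differences.get)

for name, diff in differences.items():
    print(f"{name:<14}{diff:>7}")
print(f"smallest gap: {smallest} ({differences[smallest]})")
print(f"largest gap:  {largest} ({differences[largest]})")
```

The gap stays within a fairly narrow band across all fifteen programs, which
is consistent with (though not proof of) a roughly fixed run-time library
cost rather than a cost that grows with the amount of source code.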
CONCLUSION:
===========
These benchmarks illustrate three advantages that Turbo Pascal has over
TopSpeed Pascal: the non-standard 6-byte REALs offer a performance advantage
on machines which have no coprocessor (neither a separate x87 chip nor a
486DX); for small programs Turbo Pascal produces smaller executables; and
simple file IO appears to be faster. However TopSpeed Pascal uses standard
floating point representations that match those used by the coprocessor.
There are no restrictions as to which floating point representation you may
use in your TopSpeed Pascal program. The TopSpeed emulator correctly uses
x87 code when run on a 486DX. If you have a machine with a coprocessor,
floating point programs may run up to 5 times faster with TopSpeed Pascal
(see RealMath). There are a number of ways of controlling file IO in order
to speed up operations. Although for small programs TopSpeed Pascal produces
larger executables than Turbo, this is not always the case. We have had
reports of programs larger than 400K under Turbo shrinking to 250K under
TopSpeed! This suggests that the size overhead seen in small programs is
introduced by the run-time library.
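
The "up to 5 times" figure can be checked against the tables. The Python
sketch below (illustrative only) divides the TopSpeed x87 Loops per second
value by the Turbo x87 value for each floating point benchmark.

```python
# (TopSpeed, Turbo) loops per second from the x87 columns of the tables.
x87_loops_per_second = {
    "FBench": (299.39, 150.29),
    "Float": (5.88, 2.21),
    "GAMM": (160.14, 102.64),
    "RealMath": (713.86, 132.12),
    "Savage": (3.14, 1.26),
    "TrigLog": (11.20, 4.02),
    "Whetstone": (2.78, 1.37),
}

# Ratio above 1.0 means TopSpeed completed more loops per second.
speedups = {
    name: topspeed / turbo
    for name, (topspeed, turbo) in x87_loops_per_second.items()
}
best = max(speedups, key=speedups.get)
worst = min(speedups, key=speedups.get)

for name, ratio in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:<10}{ratio:5.2f}x")
```

RealMath gives the largest ratio and GAMM the smallest, so on these figures
the advantage with a coprocessor ranges from roughly one and a half times to
a little over five times.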
Turbo Pascal's apparent advantages are short lived. Most people are using
larger machines now and applications are growing. The size difference for
small programs is going to be of little concern to most users and developers,
however program shrinkage for larger applications is still an issue. Most
people running heavily numeric applications own a machine with a coprocessor.
I may be biased, but I think that what little information one can safely
glean from these benchmarks shows TopSpeed Pascal to be the compiler of
choice for most serious developers. If anybody wishes to provide
substantiated statistics from real world applications which show Turbo Pascal
against TopSpeed Pascal or Modula-2, I'd be happy to include them in this
paper.
Sean Wilson, Clarion Software 11 August 1992